Model-Based Offline Reinforcement Learning with Local Misspecification

Authors

Abstract

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch, and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimism approximations to the value function. Our key insight is to jointly consider selecting over dynamics models and policies: as long as a dynamics model can accurately represent the dynamics of the state-action pairs visited by a given policy, it is possible to approximate the value of that particular policy. We analyze our lower bound in the LQR setting and also show competitive performance to previous lower bounds on policy selection across a set of D4RL tasks.
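The joint selection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `select_policy`, the form of the bound (estimated value minus a penalty), and the input numbers are all assumptions made for illustration. Each candidate policy is scored by the tightest lower bound that any candidate dynamics model can certify for it, where the penalty stands in for how poorly that model fits the state-action pairs that the policy visits.

```python
from typing import List, Tuple


def select_policy(
    values: List[List[float]],
    penalties: List[List[float]],
) -> Tuple[int, int, float]:
    """Pick the (policy, model) pair with the highest pessimistic value.

    values[i][j]    -- hypothetical value of policy i estimated under model j
    penalties[i][j] -- hypothetical misspecification penalty of model j on
                       the state-action pairs visited by policy i
    Returns (policy index, model index, lower bound).
    """
    best = None
    for i, (v_row, p_row) in enumerate(zip(values, penalties)):
        for j, (v, p) in enumerate(zip(v_row, p_row)):
            lower_bound = v - p  # pessimistic estimate for this pair
            # Keep the pair with the largest lower bound seen so far.
            if best is None or lower_bound > best[2]:
                best = (i, j, lower_bound)
    if best is None:
        raise ValueError("no candidate (policy, model) pairs given")
    return best


# Illustrative numbers only: two policies, two models.
values = [[10.0, 8.0], [9.0, 9.5]]
penalties = [[5.0, 1.0], [0.5, 3.0]]
policy_idx, model_idx, bound = select_policy(values, penalties)
```

In this toy input, the model that gives the highest raw value for a policy is not the one that certifies the best bound, because its penalty on that policy's visited state-action pairs is larger. That is the sense in which a model only needs to be accurate locally.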

Similar resources

Reinforcement Learning: Model-based

Reinforcement learning (RL) refers to a wide range of different learning algorithms for improving a behavioral policy on the basis of numerical reward signals that serve as feedback. In its basic form, reinforcement learning bears striking resemblance to ‘operant conditioning’ in psychology and animal learning: actions that are rewarded tend to occur more frequently; actions that are punished ar...

Model-Based Reinforcement Learning

Reinforcement Learning (RL) refers to learning to behave optimally in a stochastic environment by taking actions and receiving rewards [1]. The environment is assumed Markovian in that there is a fixed probability of the next state given the current state and the agent’s action. The agent also receives an immediate reward based on the current state and the action. Models of the next-state distr...

Offline Evaluation of Online Reinforcement Learning Algorithms

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we ...

Reinforcement Learning and Distributed Local Model Synthesis

Reinforcement learning is a general and powerful way to formulate complex learning problems and acquire good system behaviour. The goal of a reinforcement learning system is to maximize a long term sum of instantaneous rewards provided by a teacher. In its extremum form, reinforcement learning only requires that the teacher can provide a measure of success. This formulation does not require a t...

Model-based reinforcement learning with spiking neurons

Behavioural and neuroscientific data on reward-based decision making point to a fundamental distinction between habitual and goal-directed action selection. An increasingly explicit set of neuroscientific ideas has been established for habit formation, whereas goal-directed control has only recently started to attract researchers’ attention. While using functional magnetic resonance imaging to ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i6.25903